Participation Distribution in Committee Selection

Executive Summary

In the following computer experiments, we study how committee seats are distributed among participants as the size of the participant pool of stake pool operators (SPOs) and the size of the committee vary. We show that the pigeonhole principle helps interpret the results: because the number of seats is finite, the seats assigned to each participant depend on stake, group size, and committee size.

The experiment is designed to:

  • Sample a group of participants from the population without replacement.
  • Calculate the stake weight of each participant, which is the stake normalized over the group so that the weights sum to 1.
  • Assign a committee of a fixed size based on each participant's stake weight, using random selection with replacement.
  • Analyze how the distribution of committee seats depends on group size.
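
The four steps above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the code in participation_lib: the toy lognormal population, the seed, and the helper name `run_trial` are all made up for the example.

```python
import numpy as np


def run_trial(population_stake, group_size, committee_size, rng):
    """One trial: sample a group, weight it by stake, and seat a committee.

    Hypothetical helper for illustration; not part of participation_lib.
    """
    # Step 1: sample the group from the population without replacement.
    group = rng.choice(population_stake, size=group_size, replace=False)

    # Step 2: normalize stake over the group so the weights sum to 1.
    weights = group / group.sum()

    # Step 3: fill the seats by stake-weighted draws with replacement.
    seats = rng.choice(group_size, size=committee_size, replace=True, p=weights)

    # Step 4: count how many seats each group member received.
    seat_counts = np.bincount(seats, minlength=group_size)
    return weights, seat_counts


rng = np.random.default_rng(seed=0)
# Assumed toy population with a heavy-tailed stake distribution.
population_stake = rng.lognormal(mean=10.0, sigma=3.0, size=3000)

weights, seat_counts = run_trial(population_stake, 100, 100, rng)
n_selected = int((seat_counts > 0).sum())
print(f"seats assigned: {seat_counts.sum()}, distinct participants seated: {n_selected}")
```

Because seats are drawn with replacement, one participant can hold several seats, so the number of distinct participants seated is usually well below the committee size when stake is concentrated.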

We ran the experiment in two passes: a single-iteration run with group and committee sizes of 200, 300, 400, and 500, and a 100-iteration run with sizes from 100 to 1200 in steps of 100. The results are visualized through plots of committee assignments, in which we vary the group size to see how the committee selection and seat counts change.

The results show that group members with small stake weights may not be selected for any committee seat. Each repeated trial in which a new committee is selected is called an epoch. A participant with nonzero stake weight has a nonzero probability of being selected in every epoch, so any such participant can be selected in the long run. In the short term, however, there is a significant chance that some participants are never selected. This is a natural outcome of a selection process with a discrete and finite number of seats, and it is a property of the committee selection process as it currently stands.
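
The short-term and long-term behaviour can be written down under a simplifying assumption: every seat is an independent draw in which participant $i$ is chosen with probability equal to its stake weight, and the group and its weights stay fixed across epochs. The symbols $w_i$ (stake weight), $k$ (committee size), $n$ (group size), $T$ (number of epochs), and $D$ (number of distinct participants seated in one epoch) are notation introduced here for the sketch.

```latex
\begin{align}
  \Pr[\text{participant } i \text{ gets no seat in one epoch}] &= (1 - w_i)^{k}, \\
  \Pr[\text{participant } i \text{ gets no seat in } T \text{ epochs}] &= (1 - w_i)^{kT}, \\
  \mathbb{E}[D] &= \sum_{i=1}^{n} \bigl(1 - (1 - w_i)^{k}\bigr),
  \qquad D \le \min(n, k).
\end{align}
```

For $w_i > 0$ the second probability tends to 0 as $T \to \infty$, but it stays close to 1 whenever $k\,T\,w_i \ll 1$. The bound $D \le \min(n, k)$ is the pigeonhole principle: $k$ seats can hold at most $k$ distinct participants.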

In [ ]:
# %%

# Load the required libraries

from participation_lib import (
    np,
    pd,
    plt,
    sns,
    load_data,
    get_stake_distribution,
    assign_commitee,
    simulate,
    std_error,
    plot_group_to_committee_index,
    plot_selection_count_vs_stake,
    plot_committee_selection_counts,
    plot_committee_selection_seat_cutoff,
    plot_participation,
)
In [ ]:
# %%

# Load the Data: The population of registered SPOs

population = load_data("../data/pooltool-cleaned.csv")

# info() prints its summary directly and returns None, so no print() is needed
population.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3056 entries, 0 to 3055
Data columns (total 3 columns):
 #   Column         Non-Null Count  Dtype  
---  ------         --------------  -----  
 0   id             3056 non-null   object 
 1   stake          3056 non-null   int64  
 2   stake_percent  3056 non-null   float64
dtypes: float64(1), int64(1), object(1)
memory usage: 71.8+ KB
In [ ]:
# %%

population.describe()
Out[ ]:
              stake  stake_percent
count  3.056000e+03    3056.000000
mean   7.305314e+06       0.032723
std    1.648449e+07       0.073839
min    0.000000e+00       0.000000
25%    5.265000e+02       0.000002
50%    5.692500e+04       0.000255
75%    3.282500e+06       0.014703
max    1.054300e+08       0.472250
In [ ]:
# %%

# Let's now sample a group of participants from the population
# and calculate the stake weight for each participant.

group_size = 100

group_stakes = get_stake_distribution(
    population,
    group_size=group_size,
    num_iter=100,
    plot_it=True,
)
print(group_stakes)
[Figure: plot produced by get_stake_distribution with plot_it=True]
          stake  stake_weight
0   71397500.00  8.753869e-02
1   67897900.00  8.324792e-02
2   64359600.00  7.890970e-02
3   60630500.00  7.433754e-02
4   55516400.00  6.806727e-02
..          ...           ...
95        17.54  2.150536e-08
96        10.50  1.287379e-08
97         5.98  7.331929e-09
98         3.50  4.291263e-09
99         1.75  2.145631e-09

[100 rows x 2 columns]
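
To get a feel for what these weights mean for selection, the sketch below plugs the largest and smallest printed stake weights into the independent-draw model. It assumes each seat is an independent stake-weighted draw and that the group stays fixed across epochs; the 1000-epoch horizon and the helper name `prob_never_seated` are choices made for the illustration.

```python
import math

committee_size = 100  # seats per epoch, matching group_size above
epochs = 1000         # assumed horizon (about 3 years of daily epochs)

# Stake weights copied from the printed group_stakes rows 0 and 99.
w_largest = 8.753869e-02
w_smallest = 2.145631e-09


def prob_never_seated(weight, seats, n_epochs):
    """P(no seat in any epoch), assuming independent stake-weighted draws."""
    # log1p keeps precision when the weight is tiny.
    return math.exp(seats * n_epochs * math.log1p(-weight))


p_large = prob_never_seated(w_largest, committee_size, 1)
p_small = prob_never_seated(w_smallest, committee_size, epochs)
print(f"largest stake, unseated in one epoch: {p_large:.2e}")
print(f"smallest stake, unseated in {epochs} epochs: {p_small:.6f}")
```

Under these assumptions the largest participant is almost certain to be seated in any single epoch, while the smallest remains unseated over all 1000 epochs with probability of roughly 0.9998.
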
In [ ]:
# %%

print(group_stakes.describe())
              stake  stake_weight
count  1.000000e+02  1.000000e+02
mean   8.156108e+06  1.000000e-02
std    1.687489e+07  2.068988e-02
min    1.750000e+00  2.145631e-09
25%    1.981992e+03  2.430072e-06
50%    1.456142e+05  1.785339e-04
75%    5.105425e+06  6.259634e-03
max    7.139750e+07  8.753869e-02
In [ ]:
# %%

# Let's now assign a committee of the fixed group_size
# based on the stake weight of each

results = assign_commitee(
    group_stakes,
    committee_size=group_size,
    num_iter=1,
    plot_it=True,
)
[Figure: plot produced by assign_commitee with plot_it=True]
In [ ]:
# %%

# Let's now create plots of committee assignments where we vary
# the group and committee sizes over {200, 300, 400, 500} and see
# how the committee selection and seat count change.

# Initialize Parameters:
# comm_sizes = [100]  # vary over committee size, k
# group_sizes = [100]  # vary over group size, n
comm_sizes = range(200, 501, 100)  # vary over committee size, k
group_sizes = range(200, 501, 100)  # vary over group size, n
num_iter = 1  # Number of iterations for Monte Carlo simulation

# Note that the number of iterations here can be interpreted as the number
# of selection rounds for the committee, which we call an epoch.
# If we have a new epoch per day, then 1000 iterations is about 3 years.
In [ ]:
# %%

# Call the function
sim_results_df = simulate(
    population,
    comm_sizes,
    group_sizes,
    num_iter,
    plot_it=True,
)
[Figures: one committee assignment plot for each combination of Committee Size = 200, 300, 400, 500 and Group Size = 200, 300, 400, 500]
In [ ]:
# %%

# Extract the data for plotting

col_index = sim_results_df.columns
committee_sizes = [
    int(col.split("=")[1].strip()) for col in col_index.get_level_values(0).unique()
]
group_sizes = [
    int(col.split("=")[1].strip()) for col in col_index.get_level_values(1).unique()
]

# Plot the percentage of group participants not selected for committee seats
plot_participation(sim_results_df, committee_sizes, group_sizes, num_iter)
[Figure: plot produced by plot_participation]
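
The size extraction in the cell above depends on column labels of the form "Committee Size = 200". The sketch below shows the same parsing on a toy MultiIndex; the index is made up to mimic the label format seen in the output, and `parse_sizes` is a hypothetical helper rather than part of participation_lib.

```python
import pandas as pd

# Toy column index that imitates the labels used in sim_results_df.
cols = pd.MultiIndex.from_product(
    [
        ["Committee Size = 200", "Committee Size = 300"],
        ["Group Size = 200", "Group Size = 300", "Group Size = 400"],
    ],
    names=["Committee Size", "Group Size"],
)


def parse_sizes(index, level):
    """Return the integer after '=' for each unique label on one level."""
    labels = index.get_level_values(level).unique()
    return [int(label.split("=")[1].strip()) for label in labels]


committee = parse_sizes(cols, 0)
group = parse_sizes(cols, 1)
print(committee, group)  # [200, 300] [200, 300, 400]
```
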
In [ ]:
# %%

# Plot the committee selection counts distribution
fig = plt.figure(figsize=(12, 8))

plot_data = sim_results_df.loc["Committee Seats"].loc["mean"]

# Build the palette once, with one color per (committee, group) pair
colors = sns.color_palette("tab20", len(plot_data.index))

for color, (c, g) in zip(colors, plot_data.index):
    y = plot_data.loc[(c, g)]
    x = y.index

    n_c = int(c.split("=")[1].strip())
    n_g = int(g.split("=")[1].strip())

    plt.bar(x, y, alpha=0.7, color=color, label=f"{n_c}, {n_g}")

plt.xlabel("Participant Index")
plt.ylabel("Committee Seat Count (average)")
plt.title("Committee Seat Count for Participants")
plt.legend(title="Committee Size, Group Size")
plt.xlim(0, 200)
plt.show()
[Figure: Committee Seat Count for Participants. x-axis: Participant Index; y-axis: Committee Seat Count (average)]
In [ ]:
# %%

# Distinct Voters
committee_voters = sim_results_df.loc["Distinct Voters"]

# Mean and standard deviation of the distinct-voter counts
mean_values = committee_voters.loc["mean"]
std_dev_values = committee_voters.loc["sd"]

# "Distinct Voters" holds counts of distinct participants seated, not percentages
print("Distinct Group Participants Selected for Committee Seats (count):")
committee_participation = pd.concat([mean_values, std_dev_values], axis=1)
# committee_participation.columns = ["Mean", "Std Dev"]

print(committee_participation)
Distinct Group Participants Selected for Committee Seats (count):
                                        mean   sd
Committee Size       Group Size                  
Committee Size = 200 Group Size = 200   52.0  0.0
                     Group Size = 300   64.0  0.0
                     Group Size = 400   80.0  0.0
                     Group Size = 500   92.0  0.0
Committee Size = 300 Group Size = 200   57.0  0.0
                     Group Size = 300   75.0  0.0
                     Group Size = 400   84.0  0.0
                     Group Size = 500   99.0  0.0
Committee Size = 400 Group Size = 200   58.0  0.0
                     Group Size = 300   76.0  0.0
                     Group Size = 400  103.0  0.0
                     Group Size = 500  121.0  0.0
Committee Size = 500 Group Size = 200   65.0  0.0
                     Group Size = 300   81.0  0.0
                     Group Size = 400  101.0  0.0
                     Group Size = 500  127.0  0.0
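
The pigeonhole principle gives a quick sanity check on the single-run table above. The sketch reads the mean column as the number of distinct participants seated, which is an assumption based on the "Distinct Voters" row name (values above 100 rule out a percentage). The values are copied from the printed output.

```python
# (committee_size, group_size, distinct_seated) copied from the single-run table.
observed = [
    (200, 200, 52), (200, 300, 64), (200, 400, 80), (200, 500, 92),
    (300, 200, 57), (300, 300, 75), (300, 400, 84), (300, 500, 99),
    (400, 200, 58), (400, 300, 76), (400, 400, 103), (400, 500, 121),
    (500, 200, 65), (500, 300, 81), (500, 400, 101), (500, 500, 127),
]

report = []
for k, n, distinct in observed:
    max_distinct = min(k, n)      # pigeonhole: k seats hold at most k people
    min_unseated = max(0, n - k)  # guaranteed unseated whenever n > k
    unseated = n - distinct       # unseated in this particular run
    report.append((k, n, max_distinct, min_unseated, unseated))
    print(
        f"k={k:3d} n={n:3d}  distinct={distinct:3d} (max {max_distinct})  "
        f"unseated={unseated:3d} (at least {min_unseated})"
    )
```

In every row the observed number of unseated participants is far above the pigeonhole minimum, which suggests that the concentration of stake, rather than the seat count alone, drives most of the non-selection.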
In [ ]:
# %%

# Let's now repeat the simulation over a wider range, varying the
# group and committee sizes over {100, 200, ..., 1200} with 100
# iterations, and see how the committee selection and seat count change.

# Initialize Parameters:
# comm_sizes = [100]  # vary over committee size, k
# group_sizes = [100]  # vary over group size, n
comm_sizes = range(100, 1201, 100)  # vary over committee size, k
group_sizes = range(100, 1201, 100)  # vary over group size, n
num_iter = 100  # Number of iterations for Monte Carlo simulation

# Note that the number of iterations here can be interpreted as the number
# of selection rounds for the committee, which we call an epoch.
# If we have a new epoch per day, then 1000 iterations is about 3 years.
In [ ]:
# %%

# Call the function
sim_results_df = simulate(
    population,
    comm_sizes,
    group_sizes,
    num_iter,
    plot_it=True,
)
[Figures: one committee assignment plot for each combination of Committee Size = 100, 200, ..., 1200 and Group Size = 100, 200, ..., 1200]
In [ ]:
# %%

# Extract the data for plotting

col_index = sim_results_df.columns
committee_sizes = [
    int(col.split("=")[1].strip()) for col in col_index.get_level_values(0).unique()
]
group_sizes = [
    int(col.split("=")[1].strip()) for col in col_index.get_level_values(1).unique()
]

# Plot the percentage of group participants not selected for committee seats
plot_participation(sim_results_df, committee_sizes, group_sizes, num_iter)
[Figure: plot produced by plot_participation]
In [ ]:
# %%

# Plot the committee selection counts distribution
fig = plt.figure(figsize=(12, 8))

plot_data = sim_results_df.loc["Committee Seats"].loc["mean"]

# Build the palette once, with one color per (committee, group) pair
colors = sns.color_palette("tab20", len(plot_data.index))

for color, (c, g) in zip(colors, plot_data.index):
    y = plot_data.loc[(c, g)]
    x = y.index

    n_c = int(c.split("=")[1].strip())
    n_g = int(g.split("=")[1].strip())

    plt.bar(x, y, alpha=0.7, color=color, label=f"{n_c}, {n_g}")

plt.xlabel("Participant Index")
plt.ylabel("Committee Seat Count (average)")
plt.title("Committee Seat Count for Participants")
plt.legend(title="Committee Size, Group Size")
plt.xlim(0, 200)
plt.show()
[Figure: Committee Seat Count for Participants. x-axis: Participant Index; y-axis: Committee Seat Count (average)]
In [ ]:
# %%

# Distinct Voters
committee_voters = sim_results_df.loc["Distinct Voters"]

# Mean and standard deviation of the distinct-voter counts
mean_values = committee_voters.loc["mean"]
std_dev_values = committee_voters.loc["sd"]

# "Distinct Voters" holds counts of distinct participants seated, not percentages
print("Distinct Group Participants Selected for Committee Seats (count):")
committee_participation = pd.concat([mean_values, std_dev_values], axis=1)
# committee_participation.columns = ["Mean", "Std Dev"]

print(committee_participation)
Distinct Group Participants Selected for Committee Seats (count):
                                           mean        sd
Committee Size        Group Size                         
Committee Size = 100  Group Size = 100    25.91  2.025315
                      Group Size = 200    40.26   2.55194
                      Group Size = 300    50.18  3.191802
                      Group Size = 400    57.87  3.306524
                      Group Size = 500    62.37  3.110161
...                                         ...       ...
Committee Size = 1200 Group Size = 800   223.17  5.921241
                      Group Size = 900   244.46   6.25527
                      Group Size = 1000  261.56  5.846914
                      Group Size = 1100  279.83  6.926839
                      Group Size = 1200   298.7  7.054786

[144 rows x 2 columns]
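
The values in the table above exceed 100 for the larger sizes, so they read as counts of distinct participants seated rather than percentages. Under that assumption, the sketch below converts the visible rows into the percentage of the group that is selected and not selected. The column names are chosen for the example, and only the rows shown in the printed output are used.

```python
import pandas as pd

# (committee_size, group_size, mean distinct count) copied from the printed table.
rows = [
    (100, 100, 25.91),
    (100, 200, 40.26),
    (100, 300, 50.18),
    (100, 400, 57.87),
    (100, 500, 62.37),
    (1200, 800, 223.17),
    (1200, 900, 244.46),
    (1200, 1000, 261.56),
    (1200, 1100, 279.83),
    (1200, 1200, 298.70),
]
df = pd.DataFrame(rows, columns=["committee_size", "group_size", "distinct_mean"])

# Share of the group seated at least once, and its complement.
df["pct_selected"] = 100 * df["distinct_mean"] / df["group_size"]
df["pct_not_selected"] = 100 - df["pct_selected"]

print(df.round(2).to_string(index=False))
```

For example, with committee and group sizes both equal to 100, about 26% of the group is seated on average and about 74% is not.
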
In [ ]:
# %%

# Prepare the DataFrame for concatenation with the other simulation results
committee_participation = committee_participation.T
committee_participation.index = pd.MultiIndex.from_tuples(
    [("Committee Participation %", "mean"), ("Committee Participation %", "sd")]
)

# Concatenate this new row to the simulation results DataFrame
sim_results_df = pd.concat([sim_results_df, committee_participation], axis=0)

sim_results_df
Out[ ]:
Committee Size Committee Size = 100 ... Committee Size = 1200
Group Size Group Size = 100 Group Size = 200 Group Size = 300 Group Size = 400 Group Size = 500 Group Size = 600 Group Size = 700 Group Size = 800 Group Size = 900 Group Size = 1000 ... Group Size = 300 Group Size = 400 Group Size = 500 Group Size = 600 Group Size = 700 Group Size = 800 Group Size = 900 Group Size = 1000 Group Size = 1100 Group Size = 1200
Distinct Voters mean 25.91 40.26 50.18 57.87 62.37 67.58 70.66 73.86 75.84 77.15 ... 106.02 131.7 156.54 179.34 201.49 223.17 244.46 261.56 279.83 298.7
sd 2.025315 2.55194 3.191802 3.306524 3.110161 3.672002 3.663932 3.487177 3.801894 3.235352 ... 4.14 4.670118 5.485289 5.680176 5.356295 5.921241 6.25527 5.846914 6.926839 7.054786
Committee Seats mean 0 9.41 1 9.02 2 7.95 3 7.48 4 ... 0 4.96 1 4.66 2 4.39 3 4.6... 0 3.52 1 3.26 2 2.93 3 2.8... 0 2.38 1 2.25 2 2.13 3 2.2... 0 1.99 1 1.92 2 2.09 3 1.7... 0 1.61 1 1.70 2 1.44 3 1.7... 0 1.52 1 1.54 2 1.34 3 1.2... 0 1.09 1 1.12 2 1.17 3 1.1... 0 1.02 1 1.10 2 0.94 3 1.0... 0 1.14 1 0.77 2 0.93 3 1.0... ... 0 40.01 1 37.14 2 35.08 3 ... 0 30.26 1 28.63 2 27.42 3 ... 0 24.62 1 21.44 2 22.21 3 ... 0 20.25 1 19.13 2 18.96 3 ... 0 18.26 1 16.15 2 15.52 3 ... 0 15.72 1 14.02 2 13.68 3 ... 0 13.70 1 12.88 2 12.31 3 ... 0 13.71 1 11.20 2 11.49 3 ... 0 11.90 1 10.62 2 9.77 3 ... 0 10.77 1 9.28 2 9.51 3 ...
Committee Participation % mean 25.91 40.26 50.18 57.87 62.37 67.58 70.66 73.86 75.84 77.15 ... 106.02 131.7 156.54 179.34 201.49 223.17 244.46 261.56 279.83 298.7
sd 2.025315 2.55194 3.191802 3.306524 3.110161 3.672002 3.663932 3.487177 3.801894 3.235352 ... 4.14 4.670118 5.485289 5.680176 5.356295 5.921241 6.25527 5.846914 6.926839 7.054786

5 rows × 144 columns

In [ ]:
# %%

# Save the results to an Excel file
output_file = "../data/participation_run_results.xlsx"
sim_results_df.to_excel(output_file)
print(f"Results saved to {output_file}")
Results saved to ../data/participation_run_results.xlsx